Better subset regression (Mar 2013)

Author

  • Shifeng Xiong
Abstract

To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the true submodel always yields a smaller residual sum of squares (i.e., better model fitting) than any subset that does not, in a general asymptotic setting. This indicates that, for screening important variables, we can follow a "better fitting, better screening" rule, i.e., pick a "better" subset that has better model fitting. To find such a better subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and an accelerated version of it are proposed for searching for the best subset. Although the two algorithms cannot guarantee that the subset they yield is the best, their monotonicity property makes that subset fit better than initial subsets generated by popular screening methods, so the subset can have better screening performance asymptotically. Simulation results show that our methods are very competitive in high dimensional variable screening, even for finite sample sizes.
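The "better fitting, better screening" rule can be illustrated with a minimal exhaustive sketch. This is not the paper's EM algorithm (orthogonalizing subset screening), only the underlying best subset criterion: among all subsets of a fixed size, pick the one with the smallest residual sum of squares. The function names and toy data below are our own illustration.

```python
import numpy as np
from itertools import combinations

def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the given columns."""
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return float(r @ r)

def best_subset(X, y, k):
    """Exhaustive best subset of size k: smallest RSS wins.

    Cost grows combinatorially in the number of predictors p, which is
    exactly why the paper proposes EM-type search instead of enumeration.
    """
    p = X.shape[1]
    return min(combinations(range(p), k), key=lambda s: rss(X, y, s))

# Toy data: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.standard_normal(100)
print(best_subset(X, y, 2))  # should recover the true submodel (0, 2)
```

Under the sparsity assumption, the subset containing the true submodel attains the smallest RSS, so the best-fitting subset is also the best screening result, which is the rule the abstract states.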


Similar articles

Fast Bayesian Feature Selection for High Dimensional Linear Regression in Genomics via the Ising Approximation

Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we in...


Corrigendum to "Ensembling neural networks: Many could be better than all" [Artificial Intelligence 137 (1-2) (2002) 239-263]

In 2002, we published in Artificial Intelligence an extension [1] of a paper we presented at IJCAI-01 [2]. In Section 2 of the IJCAI-01 paper [2] and in Section 2.1 of the AIJ paper [1], we presented a criterion for selecting a subset of an ensemble of neural networks that could yield better performance than using all members of the ensemble for regression. The fundamental motivation for this c...


Forbidden vertices (arXiv:1309.2545v2 [math.OC], 1 Mar 2014)

In this work, we introduce and study the forbidden-vertices problem. Given a polytope P and a subset X of its vertices, we study the complexity of linear optimization over the subset of vertices of P that are not contained in X . This problem is closely related to finding the k-best basic solutions to a linear problem. We show that the complexity of the problem changes significantly depending o...


Kakeya-type sets in finite vector spaces (Mar 2010)

For a finite vector space V and a non-negative integer r ≤ dim V, we estimate the smallest possible size of a subset of V containing a translate of every r-dimensional subspace. In particular, we show that if K ⊆ V is the smallest subset with this property, n denotes the dimension of V, and q is the size of the underlying field, then for r bounded and r < n ≤ rq we have |V \ K| = Θ(nq); this im...



Journal:

Volume   Issue 

Pages  -

Publication date: 2013